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Abstract 

We perform a comparison of two jet clusterization algorithms. The 
first one is the standard Durham algorithm and the second one is a 
global optimization scheme, Deterministic Annealing, often used in 
clusterization problems, and adapted to the problem of jet identifica- 
tion in particle production by high energy collisions; in particular we 
study hadronic jets in WW production by high energy e + e~ scatter- 
ing. Our results are as follows. First, we find that the two procedures 
give basically the same output as far as the particle clusterization 
is concerned. Second, we find that the increase of CPU time with 
the particle multiplicity is much faster for the Durham jet clustering 
algorithm in comparison with Deterministic Annealing. Since this re- 
sult follows from the higher computational complexity of the Durham 
scheme, it should not depend on the particular process studied here 
and might be significant for jet physics at LHC as well. 

PACS Numbers: 13.87.-a, 13.38.Be, 05.10.-a, 45.10.Db 

1 Introduction 

The problem of clustering consists of optimal grouping of observed signal 
samples (set of features). In many circumstances one seeks the partition of 
the given set of features which minimizes a prescribed cost function. This 
function should embody the a priori knowledge on the geometrical aspects 
of the problem. By this way clustering is thus transformed in an optimiza- 
tion task. The main applications of clustering are in pattern recognition and 
signal compression. Here we address a different problem, i.e. the task of 
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partitioning particles produced in high energy e + e~ annihilation into a cer- 
tain number of cone regions, i.e. small solid angles. This is the well known 
problem of jet clustering in high energy physics. It arises from the need to 
relate energy and momentum of the cluster (jet) to the four-momentum of 
the underlying and unobservable parton. There are at the moment a few 
well established jet clustering algorithms, to be reviewed in section 2. While 
tuned to the particular problem at hand they were not studied from the 
point of view of the theory of clustering algorithms and are not generally 
considered as minimization problems of prescribed cost functions. We feel 
this is a matter to be further investigated and therefore in this letter we 
wish to raise a few questions: Can the existing jet algorithms result from 
some variational principle? Can the general theory of clustering algorithms 
be used to reduce the computational time of jet algorithms? In this letter we 
answer these questions by considering a specific algorithm, the Deterministic 
Annealing (DA) |], ||, to be reviewed in section |3|; using an analogy with 
statistical physics, DA relates the problem of clustering to that of finding 
the global minimum of a thermodynamical potential. In order to compare 
this method to the traditional ones we will study a specific issue, i.e. the 
production of jets in WW production in high energy e + e~ colliders. This ap- 
plication will be discussed in section £|. Our results are contained in section 
^ they show that the jet clustering algorithms and Deterministic Anneal- 
ing give similar performances, but the computational complexity of DA is 
considerably lower; the use of methods based on variational principles, such 
as DA, would be therefore certainly of interest in the analysis of hadronic 
final states at the future accelerators such as the Large Hadron Collider at 
CERN, where the multiplicities might be very high and the computational 
complexity extremely demanding for ordinary jet clustering algorithms. 



2 Jet clustering algorithms 

The most common jet clustering algorithms used in studies of e + e~ annihi- 
lation are the JADE ||, Durham |3J and Cambridge || algorithms^. The 
prototype of these clustering methods and the oldest one is the JADE al- 
gorithm. One considers all the possible pairs (i, j) of particles in the final 
state, with energies E iy Ej, and angular separation 9y and computes the jet 

1 For a review of these and other jet algorithms see ||. For a review of the Montecarlo 
generators and their connections with the jet algorithms see [Q. 
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resolution variable 



2E i E i (l - cos ft; 



where is the visible energy, i.e. the sum of energies for all particles ob- 
served in the final state. This test variable is then compared to a given thresh- 
old parameter y cut and the pair is recombined into a new pseudo-particle k 
of four-momentum 

Pk=Pi + Pj (2) 

(E scheme) provided that y^ < y cut . The algorithm is then reiterated 
to the new set of (pseudo)particles and it stops when, for all pairs, > 
y cut . The number of pseudoparticles at the end of the algorithm counts the 
number of jets, which is therefore fixed by y cut . The theoretical advantage 
of this recombination scheme lies in the absence of collinear and infrared 
singularities, as the regions of phase space where these divergences could be 
generated are automatically excluded. 

A drawback of the definition ([]]) is that also particles at very different 
angles can be recombined in one pseudoparticle. This is due to the fact that 
2EiEj(l — cos 6 ij) ~ Mfj, where is the invariant mass of the two particles. 
On the theoretical side this implies the presence of large order corrections 
that cannot be resummed; on the experimental side the trouble is that ghost 
jets may appear i.e. jets along directions where no particles are present. To 
cure this problem the Durham algorithm was introduced, which is based on 
the following definition of the test variable: 



d 2min{E?,E]}(l -cos%) 
Vij = Vij = ^2 ■ ( 3 ) 



^ WIS 



Clearly the resolution criterion y^ > y cut becomes, for small angles, k^ > 
Evis Vcnti where k Ti is the transverse momentum of the i— th particle to the 
direction of the j— th one. In this way the algorithm tries to minimize the 
transverse momentum and not the invariant mass. On the other hand the 
recombination scheme is still given by (fj). 

The Durham algorithm presents a problem at very small values oiy cut . As 
a matter of fact, when one tries to resolve the final state to get a larger number 
of jets, particles that are almost collinear can be recombined, thus producing 
unwanted junk jets. This feature is solved by the Cambridge algorithm by 
introducing a third step in the process of formation of the clusters. Before 
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considering the jet resolution and the recombination steps, one introduces an 
ordering variable = 2(1 — cos^). Once the pair (k, £) with the minimum 
value of is found, the resolution variable yu (still given by @) is computed 
and, after a comparison with y cut , the recombination scheme (0) is appliedP]. 

In this paper we are not interested in the analysis of small y cu t as we 
consider the production of a W pair in e + e~ scattering, i.e. a fixed (four) 
number of jets. In this context the Durham and the Cambridge algorithms 
produce analogous results. Therefore in the sequel we will mainly refer to the 
Durham algorithm. For a more detailed comparison among jet algorithms 
one can however see ||. 



3 Deterministic annealing 

Deterministic annealing algorithms |l] (for reviews see 0) take their name 
from the annealing procedure in physical chemistry. This process starts from 
a metastable state of the metal, reached by a sudden decrease of temperature; 
the annealing procedure consists in a gradual cooling by which the mineral 
passes from the metastable phase to the low temperature minimum of energy. 
During the annealing procedure the system passes through states of thermal 
equilibrium and, correspondingly, of minimal Helmholtz free energy F. In 
the limit of low temperatures one is therefore guaranteed that the system is 
in the global minimum and not in a local minimum of F. 

A computational method trying to simulate annealing was invented about 
twenty years ago ||, based on the Metropolis algorithm |10| . This method, 



called Simulated Annealing (SA) not only simulates annealing in its quest for 
the global minimum of the free energy, but also in its stochastic evolution; 
because of this last feature it can become rather time-consuming. This snag 
is avoided in Deterministic Annealing, the method we will use here. It is still 
an annealing method because it points to the global minimum of F and allows 
gradual cooling of the system, but it is deterministic since the procedure of 
optimization (see below) is obtained deterministically and not by a random 
process. In the sequel we shall describe DA in general terms, while in the 
next section we shall discuss the modifications we have implemented to adapt 
it to the particular problem of the determination of the four jets in WW 
production in e + e~ diffusion. 

2 The Cambridge algorithm implements the so called soft freezing, i.e. if the particles 
are not recombined the softer particle is removed and considered as a resolved jet. 
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To start with, one defines two sets, the set of the data points x G X and 
the set of the representative points y G Y, also called code — vectors, i.e. 
the points that eventually represent the clusters. One also defines a distance 
d(x, y) between the data point x and the code- vector y. In jet physics x = p^ 1 , 
i.e. the four momentum of one of the particles in the final state, while y is 
the four momentum of a cluster; as for the distance d(xi,Xj) we will take it 
coincident with y^, i.e. one of the basic distances in the jet physics. 

DA fixes the code-vectors by minimizing the Helmholtz free energy 

F* = -TJ2 In (Z e- d ^ T ) (4) 

x \ y / 

with respect to the code vector. Basic ingredient of the calculation is the use 
of F* as free energy; it is based on the use of the conditional probabilities 

e -d(x,y)/T 

p(y\x) = — 7} — (5) 

£>x 

with 

Zx = Y^e- d ^' T . (6) 

y 

It is well known that the Gibbs distribution (|5]) is the minimum point of the 
free energy, defined in general as 

F = D-TH . (7) 

Here the role of the energy is played by 

D = J2p(x,y)d(x,y) , (8) 

x,y 

where p(x, y) is the joint probability of x and y and the entropy is the 
Shannon entropy, i.e. 

h = - E p(x, y) in p(%, y) ■ (9) 

x,y 

By minimizing F in (|7|) one gets F* provided the Gibbs distribution is used; 
therefore the use of the probability (^|), together with the algorithmic search 
for a minimum of F* in the variables y, is tantamount to the quest of the 
global minimum of the Helmholtz free energy. 
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The optimal code-vectors y are obtained by solving the equations 

J2p(y\x)V y d(x,y) = , (10) 

X 

which is found by the above mentioned procedure of minimization of F*. 

During the cooling process (T — > 0) one encounters phase transitions 
that are signalled by an increase of the number of clusters. This shows the 
similarity between DA and the jet clustering algorithms, where the number 
of clusters can also increase by a reduction of the parameter y cut . This 
aspect is not particularly relevant here, since the number of jets is held 
fixed. Nevertheless we discuss it for two reasons. First, it allows to stress an 
important aspect of DA that we will modify in section [5] for the application 
in jet physics, i.e. the definition of the optimal code- vectors; second, it could 
be useful in other applications where the number of jets is not fixed a priori. 

To increase the number of clusters one starts with a high value of T; it can 
be shown that in the limit of very high temperatures the minimum condition 
flip] ) has one degenerate solution (all the y equal). This corresponds to one 
cluster and, in the statistical mechanics analogy, to a completely disordered 
phase, typical of a high temperature state. After computation of this unique 
value yi by (|H]), one performs a deterministic updating according to the 
formula 

yi = J2 x p( x )p(yi\ x ) ■ ( n ) 

X 

If a pre-chosen convergence test is satisfied, one decreases the temperature. 
One can show that there are phase transitions when a value yo is found such 
that 

det l-^C xlyo =0, (12) 

where C x \ y is the covariance matrix computed with the (posterior) conditional 
probability p(x\y). If a solution of ( |T2] ) is obtained for a certain critical 
temperature 

T = T 1 ^ 2 , (13) 

then the previous set of code- vectors corresponds no longer to a minimum of 
the free energy. Therefore one adds a new cluster (therefore there are now 
two code- vectors, yi, y 2 ) and the procedure of optimization is repeated. In 
general, instead of ( p|) one has 

v ' = — m — ' (14) 



6 



The algorithm continues until a pre-fixed temperature (or number of clusters) 
is reached. It is worth noticing that this procedure does not assign uniquely 
each data point to a cluster, because some points can remain unassigned |I| . 
Therefore, in the final step, one cools down the system (T — > 0) until all the 
particles are assigned. 



4 Deterministic annealing and W masses in 
e + e~ diffusion 

A W pair created in e + e~ annihilation produces in the subsequent evolution 
four jets; the study of this multi-jet final state is one of the methods for the 
determination of the W mass since the four jets can be divided into two pairs, 
each having as invariant mass m w . The world average for this parameter is 
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m w = (80.422 ± 0.047) GeV/c , (15) 

while the measurement at LEP, obtained combining both the hadronic and 
semileptonic channels, is |T2| 



m w = (80.450 ± 0.039) GeV/c 2 . (16) 

Given the exploratory character of this paper we are less interested in the 
prediction of the actual experimental data than in the comparison of the DA 
and jet algorithms; therefore we choose to work with Montecarlo generated 
data. The data set consists of about 1500 events produced at the LEP 



energies by the KORALW generator [|TI| , which includes all the four-fermion 
diagrams contributing to Vy + T4 7_ -like final states. It produces the primary 
reference sample, with a W mass of mw = 80.35 GeV (Fig.^J). The KORALW 
generator is interfaced with JETSET |14| for fragmentation. 



There is one modification to be implemented in the DA algorithm for 



its use in the present study. As we have mentioned above, by eq. (|T4|) one 
would extract the code-vectors. Eq. (|TJ) would produce in the present case 
a vector representative of the jet corresponding to the average 4-momentum 
in the jet J; however in all the jet clustering algorithms discussed in section 
[| the momentum of the single particle is compared to the total momentum of 
the jet J and not to the average momentum of the particles in the cluster J. 
In order to reproduce this feature we modify the DA algorithm by imposing 
only minimization in the probability distribution p(yj\x) and not also in 
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Figure 1: The W^-mass distribution of the actual data; mw hi GeV/c 2 . 



the variables y. In practice we use Eqns. 
particular we use, instead of (|14D, 

Vj = J2 x p(yj\x) 



but not ((TO]) and (ED' in 



(17) 



where p(yj\x) is still given by (|]). 

The practical implementation of the algorithm at fixed temperature con- 
sists of alternate updating of eqns. flTT| ) and (^), until convergence is reached. 
The temperature is then lowered and iteration process is restarted from the 
solution found at the previous temperature. Let us finally observe explicitly 
that when the temperature decreases below the critical value T 3 ^ 4 the num- 
ber of jets remains fixed (K = 4). Since we go to very small temperatures, 
T ~ 10 -3 , this means that the term —TH in ([?[) is negligible; therefore, the 
final clusterization corresponds in practice to a minimum of the cost function 



D = J2J2 d ( x ,Vk) 



(18) 



k=l x&Jk 
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where x G Jk means that the particle x belong to the jet Jk of code- vector y^. 
D assumes the form (|T8| ) because the probabilities p(x,yj) assume only the 
values 1 or 0, depending on the assignment of the particle of 4-momentum x 
to the cluster Jk or to another jet. 

5 Results and discussions 

Given the data set, be it simulated or real, one has to adopt a reconstruction 
criterion; we will use the following oneP|. Among the three possible pairings 
of the four jets J a : {(J u J 2 ), (J 3 , J 4 )}, {(J x , J 3 ), ( J 2 , J 4 )}, {{Ji, Ja), {J2, J3)}, 
with invariant masses (m 2 k, rri2 k +i) (k — 1,2, 3 for the three pairings), one 
chooses the one with the minimum value of 



This method tends to underestimate the W mass, a defect that could be 
corrected by different methods [[0| ; this analysis is however of no interest at 
the moment. Our results are presented in fig. 0. We plot the distribution 
of the jy-mass for the reconstructed data set obtained using the Durham 
algorithm (on the left) and the Deterministic Annealing algorithm (on the 
right). One can see that the two algorithms provide similar results, giving as 
average value of the W mass: mw = 79.08 GeV/c 2 for the Durham method 
and m w = 79.33 GeV/c 2 for DA. 

The study of the differences between the two algorithms can be made 
using a quantitative definition of the similarity between the two clustering 
procedures. We introduce the similarity parameter S by the formula: 



m 2 k ~ m 2k +i 



(19) 



n n 



E E 



min{Ei, EAd a i a 2 



S 



i=l j=i+l 



(20) 



n n 



E E 



min{Ei, Ej} 



i=l j=i+l 



where <Li n 2 is the Kronecker delta and 




(21) 



3 Other methods are discussed in |T 
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Figure 2: The VT-mass (in GeV/c 2 ) distributions as reconstructed by the 
Durham jet algorithm (left) and Deterministic Annealing (right). 

if the particles i, j belong to the same cluster as defined by the Durham 
algorithm, while 

a% = (22) 

otherwise; a?- is defined in a similar way, but with the cluster identified by 
DA. It is clear that S = 1 is equivalent to say that two particles are in 
the same cluster according to the Durham method if and only if they are 
in the same cluster also with DA; with 5 = this never happens. The 
factor min{Ei, Ej} gives lower weight to pairs with at least one low-energy 
particle. The histogram of the similarity S is shown in Fig.|3]: we find that 
a fraction of 87.8% of the events have S > 0.90, i.e. to a large extent they 
are clustered identically by the two algorithms. It may be useful to observe 
that the percentage of events with S > 0.80 is 99.0%. Therefore we can 



10 



0.8 0.85 0.9 0.95 1 

Similarity Durham-DA 

Figure 3: The distribution of the similarity S. 

conclude that not only the two algorithms give similar results as far as the 
average W mass is concerned, as we have already observed, but also that the 
composition of the jets provided by the two clustering procedures is identical 
in a very large fraction of the cases. 

The consequence of this analysis is twofold; on one side we are insured that 
in a large majority of cases the Durham algorithm provides a minimum of the 
cost function (|Tif ); therefore we can say that, to a large extent, the Durham 
algorithm is not only a local, but also a global clusterization procedure. 

On the other side, the results obtained so far show that the use of the 
DA algorithm might be seen as a practical alternative to the standard jet 
clusterization methods in the analyses involving huge computational tasks. 
This advantage may not be particularly important in the production of WW 
pairs at the LEP energies, but could be significant at higher energies. 

The reason why we expect a significant improvement at high multiplic- 
ities when using the DA algorithm lies in the computational complexity of 
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DA which is of the order N a (N = number of particles) with a > 1, in 
comparison with a > 2 for the Durham algorithm. As a matter of fact in the 
DA algorithm basically one performs one loop over the particles, see e.g. eq. 
(|4]) that contains a single loop over x = 1, N. On the other hand, in the 
case of the Durham algorithm there are two loops as shown by the discussion 
in section (computation of for i, j = 1, N). In both cases one expects 
values for a slightly larger than, respectively, a = 1, 2; as a matter of fact in 
the DA case there are convergence criteria at fixed temperature to be satis- 
fied; in the Durham case one performs other logical operations, in particular 
the reordering process, which also grows with N. 




Figure 4: CPU time (in seconds), vs. particle multiplicity for the Determin- 
istic Annealing algorithm (lower curve) and the Durham algorithm (upper 
curve). 

We have performed the analysis of the computational complexity of the 
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two algorithms for Monte Carlo generated e + e~ events of increasing mul- 
tiplicity. The results are reported in Fig. f| where we compare the CPU 
time (in seconds) for the DA algorithm (lower curve) and for the Durham 
algorithm (upper curve) versus the multiplicity N. 
A fit of the two curves of fig.|] gives, for large N, 

Deterministic Annealing : t oc iV 1 ' 83 , 

Durham algorithm : t oc N 2m . (23) 

These results are obtained by a processor Pentium IV 1700. 

6 Conclusions 

We have compared the results found by the standard Durham algorithm 
with those obtained by a clustering algorithm based on the Deterministic 
Annealing procedure. The latter is a minimization algorithm, often used in 
clusterization problems, and has been adapted to the process studied here 
i.e. jet identification in particle production by high energy collisions. In par- 
ticular we have addressed the study of the hadronic jets in WW production 
by high energy e + e~ scattering, but our results are rather general and should 
not depend on the specific reaction considered. They £1X6 cLS follows. First, 
we find that the two procedures give basically the same output as far as the 
clusterization is concerned. This means that one can interpret the Durham 
algorithm not only as a local computational scheme, but also as a global algo- 
rithm, i.e. a scheme attempting a minimization of a prescribed cost function. 
Second, we find that the CPU time of both algorithms increases with the par- 
ticle multiplicity, but the growth is much faster for the Durham jet clustering 
algorithm, which is a consequence of the higher computational complexity 
of the Durham scheme in comparison with Deterministic Annealing. This 
result might be significant for jet physics at future accelerators such as the 
Large Hadron Collider at CERN. 
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